Exact performance of error estimators for discrete classifiers
Authors
Abstract
Discrete classification problems abound in pattern recognition and data mining applications. One of the most common discrete rules is the discrete histogram rule. This paper presents exact formulas for computing the bias, variance, and RMS of the resubstitution and leave-one-out error estimators for the discrete histogram rule. We also describe an algorithm to compute the exact probability distributions of resubstitution and leave-one-out, as well as of their deviations from the true error rate. Using a parametric Zipf model, we compute the exact performance of resubstitution and leave-one-out for varying expected true error, number of samples, and classifier complexity (number of bins). We compare this to approximate performance measures (computed by Monte Carlo sampling) of 10-repeated 4-fold cross-validation and the 0.632 bootstrap error estimator. Our results show that resubstitution is low-biased but much less variable than leave-one-out, and is effectively the superior error estimator of the two, provided classifier complexity is low. Our results also indicate that the overall performance of resubstitution, as measured by the RMS, can be substantially better than that of the 10-repeated 4-fold cross-validation estimator, and even comparable to that of the 0.632 bootstrap estimator, provided that classifier complexity is low and the expected error rates are moderate. In addition to the results discussed in the paper, we provide an extensive set of plots that can be accessed on a companion website, at the URL http://ee.tamu.edu/∼edward/exact discrete.
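To make the two estimators named in the abstract concrete, here is a minimal Python sketch for the two-class discrete histogram rule. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the count vectors `u` and `v` (per-bin sample counts for class 0 and class 1), the true bin probabilities `p` and `q`, and the convention that ties go to class 0 are all assumptions introduced here.

```python
from typing import Sequence


def resubstitution_error(u: Sequence[int], v: Sequence[int]) -> float:
    """Fraction of training points misclassified by the histogram rule.

    In each bin the rule votes for the majority class, so the points it
    gets wrong are the minority ones: min(u_i, v_i).
    """
    n = sum(u) + sum(v)
    return sum(min(ui, vi) for ui, vi in zip(u, v)) / n


def leave_one_out_error(u: Sequence[int], v: Sequence[int]) -> float:
    """Leave-one-out estimate, with ties assigned to class 0 (assumption).

    Removing a class-0 point from bin i leaves counts (u_i - 1, v_i); the
    rule then predicts class 1 iff v_i > u_i - 1, so the held-out point is
    an error iff v_i >= u_i. Removing a class-1 point leaves (u_i, v_i - 1);
    the rule predicts class 0 iff v_i - 1 <= u_i, which is an error.
    """
    n = sum(u) + sum(v)
    errors = 0
    for ui, vi in zip(u, v):
        if vi >= ui:
            errors += ui  # every held-out class-0 point in this bin is missed
        if vi <= ui + 1:
            errors += vi  # every held-out class-1 point in this bin is missed
    return errors / n


def true_error(u: Sequence[int], v: Sequence[int],
               p: Sequence[float], q: Sequence[float]) -> float:
    """True error of the designed rule, given assumed joint probabilities
    p_i = P(X = i, Y = 0) and q_i = P(X = i, Y = 1)."""
    total = 0.0
    for ui, vi, pi, qi in zip(u, v, p, q):
        # The rule predicts class 1 only on a strict class-1 majority.
        total += pi if vi > ui else qi
    return total


if __name__ == "__main__":
    u = [3, 1, 2, 0]  # class-0 counts per bin (made-up data)
    v = [1, 2, 2, 1]  # class-1 counts per bin (made-up data)
    p = [0.2, 0.1, 0.1, 0.1]
    q = [0.1, 0.1, 0.2, 0.1]
    print("resubstitution:", resubstitution_error(u, v))
    print("leave-one-out: ", leave_one_out_error(u, v))
    print("true error:    ", true_error(u, v, p, q))
```

On this small made-up sample, resubstitution comes out below the true error and leave-one-out above it, which is consistent with the low bias of resubstitution described in the abstract; a single sample says nothing about variance or RMS, which is what the paper's exact formulas address.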
Similar resources
The Ratio-type Estimators of Variance with Minimum Average Square Error
The ratio-type estimators have been introduced for estimating the mean and total population, but in recent years based on the ratio methods several estimators for population variance have been proposed. In this paper two families of estimators have been suggested and their approximation mean square error (MSE) have been developed. In addition, the efficiency of these variance estimators are com...
Exact Performance of CoD Estimators in Discrete Prediction
The coefficient of determination (CoD) has significant applications in genomics, for example, in the inference of gene regulatory networks. We study several CoD estimators, based upon the resubstitution, leave-one-out, cross-validation, and bootstrap error estimators. We present an exact formulation of performance metrics for the resubstitution and leave-one-out CoD estimators, assuming the dis...
On optimal Bayesian classification and risk estimation under multiple classes
A recently proposed optimal Bayesian classification paradigm addresses optimal error rate analysis for small-sample discrimination, including optimal classifiers, optimal error estimators, and error estimation analysis tools with respect to the probability of misclassification under binary classes. Here, we address multi-class problems and optimal expected risk with respect to a given risk func...
On the sampling distribution of resubstitution and leave-one-out error estimators for linear classifiers
Error estimation is a problem of high current interest in many areas of application. This paper concerns the classical problem of determining the performance of error estimators in small-sample settings under a Gaussianity parametric assumption. We provide here for the first time the exact sampling distribution of the resubstitution and leave-one-out error estimators for linear discriminant ana...
On Mathematical Characteristics of some Improved Estimators of the Mean and Variance Components in Elliptically Contoured Models
In this paper we treat a general form of location model. It is typically assumed that the error term is distributed according to the law belonging to the class of elliptically contoured distribution. Some sorts of shrinkage estimators of location and scale parameters are proposed and their exact bias and MSE expressions are derived. The performance of the estimators under study are compl...
Journal title:
- Pattern Recognition
Volume 38, Issue
Pages -
Publication date: 2005